
    A clustering algorithm for multivariate data streams with correlated components

    Common clustering algorithms require multiple scans of all the data to achieve convergence, which is prohibitive when large databases, with data arriving in streams, must be processed. Algorithms extending the popular K-means method to the analysis of streaming data have appeared in the literature since 1998 (Bradley et al. in Scaling clustering algorithms to large databases. In: KDD. p. 9-15, 1998; O'Callaghan et al. in Streaming-data algorithms for high-quality clustering. In: Proceedings of IEEE international conference on data engineering. p. 685, 2001), based on the memorization and recursive update of a small number of summary statistics, but they either do not take into account the specific variability of the clusters, or assume that the random vectors being processed and grouped have uncorrelated components. Unfortunately, this is not the case in many practical situations. We propose a new algorithm to process data streams whose data have correlated components and come from clusters with different covariance matrices. These covariance matrices are estimated via an optimal double shrinkage method, which yields positive definite estimates even in the presence of few data points, or of data whose components have small variance. This is needed to invert the matrices and compute the Mahalanobis distances used to assign data to clusters. We also estimate the total number of clusters from the data.
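    To make the covariance step concrete, the sketch below uses a single linear shrinkage toward a scaled identity as a simplified stand-in for the paper's optimal double shrinkage estimator; the function names and the shrinkage weight alpha are illustrative assumptions, not the authors' implementation.

        import numpy as np

        def shrinkage_covariance(X, alpha=0.1):
            # Simplified stand-in for the paper's double shrinkage estimator:
            # blend the sample covariance with a scaled identity so the result
            # stays positive definite even with few points or tiny variances.
            S = np.cov(X, rowvar=False)
            target = (np.trace(S) / S.shape[0]) * np.eye(S.shape[0])
            return (1 - alpha) * S + alpha * target

        def assign_to_cluster(x, centers, covariances):
            # Assign x to the cluster with the smallest squared Mahalanobis
            # distance (monotone in the distance itself); the full covariance
            # accounts for correlated components.
            dists = [
                (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)
                for mu, Sigma in zip(centers, covariances)
            ]
            return int(np.argmin(dists))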

    Achieving Constraints in Neural Networks: A Stochastic Augmented Lagrangian Approach

    Regularizing Deep Neural Networks (DNNs) is essential for improving generalizability and preventing overfitting. Fixed penalty methods, though common, lack adaptability and suffer from hyperparameter sensitivity. In this paper, we propose a novel approach to DNN regularization by framing the training process as a constrained optimization problem, where the data fidelity term is the minimization objective and the regularization terms serve as constraints. We then employ the Stochastic Augmented Lagrangian (SAL) method to achieve a more flexible and efficient regularization mechanism. Our approach extends beyond black-box regularization, demonstrating significant improvements in white-box models, where weights are often subject to hard constraints to ensure interpretability. Experimental results on image-based classification on the MNIST, CIFAR10, and CIFAR100 datasets validate the effectiveness of our approach. SAL consistently achieves higher accuracy while also achieving better constraint satisfaction, showcasing its potential for optimizing DNNs under constrained settings.
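    A minimal sketch of one such constrained training step, assuming PyTorch; the quadratic-penalty form of the augmented Lagrangian and the names sal_step, constraint_fn and budget are illustrative assumptions, not the authors' exact formulation.

        import torch

        def constraint_fn(model, budget=10.0):
            # Hypothetical regularization constraint: keep the total L2 norm
            # of the weights below `budget`, i.e. g(w) = ||w|| - budget <= 0.
            sq = sum((p ** 2).sum() for p in model.parameters())
            return torch.sqrt(sq) - budget

        def sal_step(model, loss_fn, batch, optimizer, lam, rho):
            # One stochastic step on the augmented Lagrangian
            #   f(w) + lam * g(w) + (rho / 2) * g(w)^2,
            # where f is the data-fidelity loss and g the constraint violation.
            x, y = batch
            optimizer.zero_grad()
            f = loss_fn(model(x), y)
            g = constraint_fn(model)
            (f + lam * g + 0.5 * rho * g ** 2).backward()
            optimizer.step()
            return g.item()

        # Dual update, e.g. once per epoch, projected for the inequality:
        # lam = max(0.0, lam + rho * g_value)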

    From Noisy Point Clouds to Complete Ear Shapes: Unsupervised Pipeline

    Funding Information: This work was supported in part by the European Union's Horizon 2020 Research and Innovation Programme through the Marie Skłodowska-Curie Project BIGMATH, under Agreement 812912, and in part by Eureka Eurostars under Project E!11439 FacePrint. The work of Cláudia Soares was supported in part by the Strategic Project NOVA LINCS under Grant UIDB/04516/2020.
    Ears are a particularly difficult region of the human face to model, not only because of the non-rigid deformations existing between shapes but also because of the challenges in processing the retrieved data. The first step towards obtaining a good model is to have complete scans in correspondence, but these usually present a higher amount of occlusions, noise and outliers than most face regions, thus requiring a specific procedure. We therefore propose a complete pipeline that takes as input unordered 3D point clouds with the aforementioned problems, and produces as output a dataset in correspondence, with completion of the missing data. We provide a comparison of several state-of-the-art registration and shape completion methods, concluding on the best choice for each step.
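    As a rough illustration of the denoising and rigid-alignment steps such a pipeline begins with, the sketch below uses Open3D's statistical outlier removal followed by point-to-point ICP; this is a generic baseline under assumed parameters, not the paper's method, which additionally handles non-rigid correspondence and shape completion.

        import open3d as o3d

        def denoise_and_register(source_path, target_path, threshold=0.02):
            source = o3d.io.read_point_cloud(source_path)
            target = o3d.io.read_point_cloud(target_path)

            # Drop points whose mean neighbor distance deviates strongly from
            # the global average, removing much of the scan noise and outliers.
            source, _ = source.remove_statistical_outlier(
                nb_neighbors=20, std_ratio=2.0)

            # Rigid ICP alignment as a first, coarse correspondence step.
            result = o3d.pipelines.registration.registration_icp(
                source, target, threshold,
                estimation_method=o3d.pipelines.registration
                .TransformationEstimationPointToPoint())
            return source.transform(result.transformation)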

    Cost Optimization of Ice Distribution

    Two questions regarding minimizing fuel costs while delivering ice along a pre-set route are tackled. The first question arises when demand exceeds the load of a single truck, so that a second truckload of ice has to be brought to some point of the route for the driver/salesman to continue with it for the rest of the route: is it better 1) for the first truck to deliver starting from the customer nearest to the base, or 2) for the first truck to start the delivery from the last customer (the one most distant from the base)? We show that the second strategy was better for the particular data examined, and we derive the basis of an algorithm for deciding which strategy is better for a given delivery schedule. The second question concerns how best to modify a regular sales route when an extra delivery has to be made. Again, the basis for an algorithm to decide how to minimize fuel costs is derived.
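    The comparison lends itself to a small calculation. The sketch below assumes a cost model in which fuel consumption per leg grows linearly with the load on board; the model, the rates and the example numbers are illustrative assumptions, not the data or algorithm from the paper, whose comparison also accounts for the second truck's trip to the handover point.

        def route_fuel_cost(legs, demands, rate_empty=1.0, rate_per_unit=0.05):
            # Fuel cost of one run: each leg costs its length times a rate
            # that grows with the load still on board (illustrative model).
            load = sum(demands)
            cost = 0.0
            for length, drop in zip(legs, demands):
                cost += length * (rate_empty + rate_per_unit * load)
                load -= drop  # the truck gets lighter after each delivery
            return cost

        # Example: three customers 2, 3 and 4 km apart along the route, each
        # demanding 10 units, served in the given visiting order. Comparing
        # the two strategies amounts to evaluating this cost for the two
        # visiting orders (outward nearest-first vs. farthest-first returning
        # toward the base).
        print(route_fuel_cost([2, 3, 4], [10, 10, 10]))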